COMPARISON OF IMAGE SEGMENTATION METHOD IN IMAGE CHARACTER EXTRACTION PREPROCESSING USING OPTICAL CHARACTER RECOGNITION
Authors
Abstract
Today, many documents exist as digital images obtained from various sources, and they must be processable by a computer automatically. One form of document image processing is text feature extraction using OCR (Optical Character Recognition) technology. However, in many cases OCR technology is unable to read characters accurately. This can be due to several factors, such as poor image quality or noise. To get accurate results, the image must be of good quality, so it needs to be preprocessed. The preprocessing methods used in this study are the Otsu Thresholding Binarization, Niblack, and Sauvola methods, while character extraction uses the Tesseract library in Python. The test results show that direct extraction from the original image gives better results, with an average match rate of 77.27%. Meanwhile, the preprocessed images reached match rates of 70.27% and 69.67%, and with Niblack only 35.72%. In some cases in this research, however, the preprocessing methods give better results.
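The three binarization rules named in the abstract can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the study's implementation: the function names, the window radius, and the parameter values (k = -0.2 for Niblack, k = 0.5 and R = 128 for Sauvola) are conventional choices that the abstract does not specify, and the Tesseract step is left out so the sketch runs without extra libraries.

```python
from statistics import mean, pstdev


def otsu_threshold(pixels):
    """Global Otsu threshold: pick t in 0..255 maximizing between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]           # pixels with value <= t
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg       # pixels with value > t
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def local_window(image, row, col, radius):
    """Pixel values in a square window around (row, col), clipped at the border."""
    rows = range(max(0, row - radius), min(len(image), row + radius + 1))
    cols = range(max(0, col - radius), min(len(image[0]), col + radius + 1))
    return [image[r][c] for r in rows for c in cols]


def niblack_threshold(window, k=-0.2):
    """Niblack: T = m + k * s, with local mean m and standard deviation s."""
    return mean(window) + k * pstdev(window)


def sauvola_threshold(window, k=0.5, R=128):
    """Sauvola: T = m * (1 + k * (s / R - 1)); R is the assumed dynamic range of s."""
    m, s = mean(window), pstdev(window)
    return m * (1 + k * (s / R - 1))


def binarize(image, method, radius=1):
    """Return a 0/1 image; 1 marks pixels brighter than the threshold."""
    if method == "otsu":
        t = otsu_threshold([p for row in image for p in row])
        return [[1 if p > t else 0 for p in row] for row in image]
    rule = niblack_threshold if method == "niblack" else sauvola_threshold
    return [
        [1 if image[r][c] > rule(local_window(image, r, c, radius)) else 0
         for c in range(len(image[0]))]
        for r in range(len(image))
    ]


# Hypothetical 3x3 grayscale patch: one dark "ink" pixel on a light background.
patch = [[200, 200, 200],
         [200, 20, 200],
         [200, 200, 200]]
for name in ("otsu", "niblack", "sauvola"):
    print(name, binarize(patch, name))
```

Otsu computes a single threshold for the whole image, whereas Niblack and Sauvola compute one per pixel from the local mean and standard deviation. On a perfectly flat window the Niblack threshold in this sketch equals the local mean, while the Sauvola threshold with k = 0.5 falls to half of it. In a full pipeline the binarized image would then be passed to Tesseract for character extraction.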
Similar resources
Image preprocessing for optical character recognition using neural networks
The primary task of this master’s thesis is to create a theoretical and practical basis for preprocessing printed text for optical character recognition using feed-forward neural networks. A demonstration application was created and its parameters were set according to the results of the experiments carried out. Project definition and task determination 1. Write an introduction about the problematics of optica...
Image Thresholding for Optical Character Recognition and Other Applications Requiring Character Image Extraction
Two new, cost-effective thresholding algorithms for use in extracting binary images of characters from machine- or hand-printed documents are described. The creation of a binary representation from an analog image requires such algorithms to determine whether a point is converted into a binary one because it falls within a character stroke or a binary zero because it does not. This thresholding i...
Stroke Extraction from Gray-Scale Character Image
In this paper, a topographic feature classification method based on 4-directional scanning and a stroke extraction method from skeletal pixels are proposed. The combination of the proposed methods is relatively fast and the resulting strokes are more acceptable.
Optical Character Recognition from Text Image
Optical Character Recognition (OCR) is a system that provides a full alphanumeric recognition of printed or handwritten characters by simply scanning the text image. OCR system interprets the printed or handwritten characters image and converts it into corresponding editable text document. The text image is divided into regions by isolating each line, then individual characters with spaces. Aft...
Image Normalization and Preprocessing for Gujarati Character Recognition
Pattern recognition has been an important area in computer vision applications. In the case of a planar image, there are four basic forms of geometric distortion caused by the change in camera location: translation, rotation, scaling and skew. So far, a number of methods have been developed to solve these distortions, such as moment invariants, Fourier descriptor, Hough transformation, shape m...
Journal
Journal title: Jurnal Teknik Informatika
Year: 2023
ISSN: 1979-9160, 2549-7901
DOI: https://doi.org/10.52436/1.jutif.2023.4.3.956